Adding speaking style to a TTS system
نویسندگان
چکیده
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, speech rate, pauses and hesitation, primary and secondary accentuation, schwa deletion, etc.). In order to make the neutral speech similar to a typical speaking style, prosodic characteristics are implemented within the TTS system itself or during a postprocessing step. The quality of the “stylized” synthesis is evaluated by comparing it to the original style.
منابع مشابه
A Model for Varying Speaking Style in TTS systems
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...
متن کاملA Corpus-based Approach to <ahem/> Expressive Speech Synthesis
Human speech communication can be thought of as comprising two channels – the words themselves, and the style in which they are spoken. Each of these channels carries information. Today's most-advanced text-to-speech (TTS) systems such as [1],[2],[3],[4] fall far short of human speech because they offer only a single, fixed style of delivery, independent of the message. In this paper, we descri...
متن کاملDevelopment of a genre-dependent TTS system with cross-speaker speaking-style transplantation
One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identifi...
متن کاملAutomatic prosodic modeling for speaker and task adaptation in text-to-speech
One of the most important demands for future TTS systems is their ability to improve naturalness when embedded in a particular task or application that requires a particular speaking style for a particular speaker. In this paper, we present a new prosodic modeling procedure for improving naturalness by adapting a TTS system to a new speaker and a new speaking style. The proposed procedure is an...
متن کاملJapanese pitch conversion for voice morphing based on differential modeling
In this paper, we convert the pitch contours predicted by a TTS system that models a source speaker to resemble the pitch contours of a target speaker. When the speaking styles of the speakers are very different, complex conversions such as adding or deleting pitch peaks may be required. Our method does the conversions by modeling the direct pitch features and differential pitch features at the...
متن کامل